Introduction

Idea

The aim of the analysis of the smashy_lcbench and smashy_super datasets is to understand the dependencies between the hyperparameters and the target variable yval, using the plots implemented in the VisHyp package and, most importantly, without the help of any automatic optimization. We want to understand which parameters are important, i.e. have a large impact on the result, which parameters need to be set precisely, and for which parameters the value is almost irrelevant. Furthermore, we want to understand the dependencies between the parameters themselves. Finally, we want to compare the results of the two datasets.

For each dataset, we examine both the entire dataset and the best 20% of the yval values to gain a more detailed insight into the configurations behind the best results. We then partition the data by bounding the range of each parameter to obtain a subset of configurations with good yval values, and inspect this constrained parameter range using PCPs.

We will use Importance Plots, Partial Dependence Plots (PDP), Heatmaps, and Parallel Coordinate Plots (PCP) to analyze the data. Importance Plots identify the most important parameters. For a quick overview, we will use Heatmaps. For a deeper insight into the boundary structure as well as dependencies between two parameters, we will use Partial Dependence Plots. Only once the dataset has been reduced in size can we also use Parallel Coordinate Plots to get a good impression of parameter configurations. In addition, we will look at the data using summaries to draw further conclusions.

Structure and Outline

This analysis is structured as follows: first, the dataset under consideration is prepared so that it can be used for the analyses. Then the analysis is performed and the results are used to suggest good configuration ranges for each parameter. The analyses, and deeper insights into each parameter, can be selected in the Table of Contents (TOC) on the left. Prior to that chapter, an overview of the dataset is provided. Finally, the results of the two datasets are compared.

Dataset: smashy_lcbench

Data Preparation

We need to load the packages and subdivide the data in order to compare the whole dataset with the subset containing the 20% of configurations with the best results. In addition, the data must be transformed to make it easier to use with summaries and filters.

Load Data

## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

All plots from the VisHyp package require an mlr3 task object as input. Therefore, an mlr3 task with the selected target is created. For lcbench, the target is yval, a logloss performance measure; values near 0 indicate good performance.

Create Task

lcbenchTask <- TaskRegr$new(id = "task_lcbench", backend = lcbenchSmashy, target = "yval")

lcbenchBest <- lcbenchSmashy[lcbenchSmashy$yval >= quantile(lcbenchSmashy$yval, 0.8),]
bestTask <- TaskRegr$new(id = "bestTask", backend = lcbenchBest, target = "yval")

Results

The target parameter yval can reach values between -0.9647 and -0.4690. Our goal is to obtain good results, i.e., to find configurations that produce values close to -0.4690.

The most important parameter is sample. It should always be set to “bohb” rather than “random”, because 2130 of the best 2143 configurations were created with this level, and the average effect on yval is much larger when “bohb” is chosen.
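Counts like this can be read off directly with table(); a minimal sketch on synthetic stand-in data (on the real data the check is simply table(lcbenchBest$sample)):

```r
# Sketch: count factor levels among the top 20% of yval values.
# The data frame d is an illustrative stand-in, not the real lcbenchSmashy.
set.seed(1)
d <- data.frame(
  sample = factor(sample(c("bohb", "random"), 1000, replace = TRUE,
                         prob = c(0.8, 0.2))),
  yval   = runif(1000, -1, -0.4)
)
best <- d[d$yval >= quantile(d$yval, 0.8), ]  # top 20% (larger yval = better)
table(best$sample)                            # level counts among the best rows
```

On the real data, a level that dominates this table while the other almost vanishes is exactly the pattern described above for “bohb” versus “random”.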

The next very important parameter is survival_fraction. A low value is better on average, but high values can also be good among the best configurations. Without any further restriction, a value between 0.15 and 0.5 should be chosen for high average performance. If a surrogate_learner is selected, the constraint on this parameter should be chosen according to the selected surrogate_learner.

Even though the surrogate_learner parameter itself is not that important, it influences most other parameters. This means that other parameter values should be set depending on the selected surrogate_learner if their effect on the performance measure differs between learners. An indication that surrogate_learner has a large impact on the other parameters was given by the Importance Plots for the partial datasets split by surrogate_learner: these assigned different importance to the individual parameters depending on the subset selected. This is especially noticeable for “bohb” samples. Parameters that should be chosen depending on the selected surrogate_learner are listed below. There are also findings on which surrogate_learner gives the best results: in the full dataset, knn1 and knn7 showed the best performance and ranger the worst. Among the top cases, many “bohblrn” and “ranger” configurations were filtered out in disproportionate numbers. Surprisingly, “bohblrn” turned out to be the level of greatest importance.

knn1: survival_fraction should be above 0.5 if we are interested in the best cases; in the whole dataset, the best cases were on average below 0.5. random_interleave_fraction should be low, with a value between 0.05 and 0.5 according to the complete dataset. budget_log_step should be chosen between -0.5 and 0.5. filter_factor_first should be under 4. filter_select_per_tournament should be over 0.9.

knn7: filter_factor_first should be under 4. survival_fraction should be between 0.1 and 1 according to both the full dataset and the subset. budget_log_step produces good performance for values between -0.5 and 1 but does not have a big impact in general. random_interleave_fraction should be between 0.25 and 0.75 according to the full dataset; in the subset it does not matter. random_interleave_random should be “FALSE”. filter_select_per_tournament should be over 0.5.

bohblrn: random_interleave_fraction is better if lower; a good value lies between 0.05 and 0.65. survival_fraction: lower is better in the full dataset, but it does not matter for the best configurations. budget_log_step is hard to judge because of fluctuation, but should be at least over -1.5. filter_algorithm should be “progressive”. filter_factor_last should be over 5. filter_factor_first should not be restricted.

ranger: random_interleave_fraction should be over 0.25. survival_fraction should be under 0.75. budget_log_step should be over -1.5.

Another important parameter for the general case is random_interleave_fraction. We found that, in general, low values under 0.3 are better for “random” samples, and values between 0.1 and 0.75 are better for “bohb” samples. However, this is only because the effect depends on surrogate_learner, which has many observations for the levels knn1 and knn7; for these levels a low value must be chosen to get a good result. For “bohblrn”, values in the middle are better, and for “ranger” high values achieve the best yval values. For the top cases, the parameter lost importance. This could be because the counterexamples with “random” samples were almost completely filtered out. The levels did not change the behavior for the top cases (for “bohblrn”, the middle range is no longer so important).

The second most important parameter for “bohb” sampling is budget_log_step. For the full dataset, this parameter should be set between -0.5 and 1; when a surrogate_learner is chosen, it should be set according to that learner.

filter_with_max_budget is not important in general, but should always be set to “TRUE” and is more important for “bohb” samples. In any case, the effect is important for the surrogate_learner “bohblrn” in the top cases.

filter_factor_first is the most important parameter for the top 20%. It also has a higher importance in “random” samples than in “bohb” samples. In general it should be low (under 4) for “bohb” samples and high (close to 6) for “random” samples. The parameter filter_factor_first should not be restricted if the surrogate_learner is “bohblrn”.

filter_factor_last: the effect is low, and this parameter should not be used to subdivide the dataset in general.

filter_select_per_tournament should not be too high in the general case, but does not really matter for good results.

filter_algorithm and random_interleave_random have hardly any effect and can be left out of deeper investigations. They should only be considered for the surrogate_learner level “bohblrn”.

Data Constraint to Check the Results

To verify the proposed parameter configurations, we constrain the dataset and compare the obtained performance with the ranks of the performance of the whole dataset.

lcbenchEvaluation <- lcbenchSmashy[lcbenchSmashy$sample == "bohb",] 
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$surrogate_learner == "bohblrn",] 
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$random_interleave_fraction > 0.05 & lcbenchEvaluation$random_interleave_fraction < 0.65,] 
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$budget_log_step > -1.5,]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$filter_with_max_budget == "TRUE",]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$filter_algorithm == "progressive",]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$filter_factor_last > 5,]

lcbenchYval <- sort(lcbenchEvaluation$yval, decreasing = TRUE)
lcbenchYvalOriginal <- sort(lcbenchSmashy$yval, decreasing = TRUE)
sort(match(lcbenchYval, lcbenchYvalOriginal), decreasing = FALSE)
##  [1]    5    6   11   12   13   16   17   21   23   25   26   28   29   30   35
## [16]   37   44   47   53   55   57   64   75   82   84  112  117  129  139  178
## [31]  182  214  247 1153 2896 3181 3944 3961 5161 5997 6095 6635 6953 7318 7450
## [46] 7707 7930 8208 8212

We can see that many good results were obtained, but not nearly all of the best configurations were found. This can be explained by the fact that we often imposed constraints simply to reduce the size of the dataset. For example, for some categorical parameters we always chose one level even though we knew that other levels could also yield good values. Furthermore, numerical parameters were partly restricted although it was known that some very good configurations achieve very good yval values outside the chosen range.
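The rank check above works by sorting both yval vectors in decreasing order (larger yval is better here) and matching each subset value to its position in the sorted full vector; a toy sketch with illustrative numbers:

```r
# Toy sketch of the rank check: positions of the subset's yval values
# within the full dataset, rank 1 being the best configuration.
full    <- c(-0.47, -0.50, -0.52, -0.60, -0.80, -0.96)  # illustrative yval values
subVals <- c(-0.50, -0.80)                              # a constrained "subset"
ranks   <- sort(match(sort(subVals, decreasing = TRUE),
                      sort(full, decreasing = TRUE)))
ranks  # 2 5: the subset holds the 2nd- and 5th-best configurations
```

Small ranks therefore mean the constrained subset captured top configurations, while large ranks (like the 7000–8000 range above) flag seemingly bad rows that slipped through the constraints.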

Most interestingly, we get many good results, but also some seemingly bad ones. This could be due to hidden interactions that were not found, or to inaccuracies in the constraints that the visualization plots suggested for the parameters. In the latter case, the poorer performance values could stem from errors in interpreting the plots. Difficulties with the surrogate model could also be decisive if the predicted performance values are not determined correctly. In addition, an inappropriate grid size in a PDP can lead to inaccuracies.

Finally, some metrics are used to verify the results. The meaning of these metrics can be found in the bachelor thesis.

summary(lcbenchEvaluation$yval)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.6003 -0.5262 -0.4798 -0.5026 -0.4748 -0.4713
#proportion
length(lcbenchEvaluation$yval)/length(lcbenchSmashy$yval)
## [1] 0.004574309
#top configuration
sum(lcbenchYval >= quantile(lcbenchSmashy$yval, 0.95))/length(lcbenchYval)
## [1] 0.6734694
#quantile
sum(lcbenchSmashy$yval<=max(lcbenchYval))/length(lcbenchSmashy$yval)
## [1] 0.9996266
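The three metrics above are, in order: the fraction of the dataset that survives the constraints, the share of the constrained subset that falls in the global top 5%, and the quantile reached by the subset's best value. A sketch on synthetic stand-in data (variable names are illustrative):

```r
# Sketch of the three verification metrics (larger yval = better).
set.seed(42)
yAll <- runif(1000, -1, -0.4)                 # stand-in for lcbenchSmashy$yval
ySub <- sort(yAll, decreasing = TRUE)[1:50]   # stand-in for the constrained subset

propKept  <- length(ySub) / length(yAll)                       # proportion of rows kept
topShare  <- sum(ySub >= quantile(yAll, 0.95)) / length(ySub)  # share in global top 5%
bestQuant <- sum(yAll <= max(ySub)) / length(yAll)             # quantile of subset's best
c(propKept, topShare, bestQuant)
```

For this idealized subset (the literal top 50 rows), topShare and bestQuant both reach 1; on the real constrained data they fall below 1 to the extent that the constraints admit mediocre configurations or miss the very best ones.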

Visual Overview

With the implemented PCP, our results can be checked visually.

Limitation to Good Configurations

knitr::include_graphics("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Latex/Grafiken/lcbench_Best_PCP.png")

Limitation to Bad Configurations

knitr::include_graphics("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Latex/Grafiken/lcbench_Bad_PCP.png")

Overview

For visual analysis it is important to know the configuration spaces and the class of parameters.

Structure

str(lcbenchSmashy)
## 'data.frame':    10712 obs. of  12 variables:
##  $ budget_log_step             : num  0.1145 -0.4292 0.0482 0.8538 -1.4559 ...
##  $ survival_fraction           : num  0.261 0.3376 0.0149 0.7322 0.8552 ...
##  $ surrogate_learner           : Factor w/ 4 levels "bohblrn","knn1",..: 3 3 3 1 3 2 1 2 3 3 ...
##  $ filter_with_max_budget      : Factor w/ 2 levels "FALSE","TRUE": 1 2 2 2 2 1 1 1 2 1 ...
##  $ filter_factor_first         : num  0.234 3.756 1.002 0.437 0.672 ...
##  $ random_interleave_fraction  : num  0.225 0.104 0.542 0.489 0.516 ...
##  $ random_interleave_random    : Factor w/ 2 levels "FALSE","TRUE": 2 2 1 1 1 1 1 1 2 1 ...
##  $ sample                      : Factor w/ 2 levels "bohb","random": 1 2 2 1 1 2 1 2 2 2 ...
##  $ filter_factor_last          : num  0.387 1.589 2.927 5.775 6.422 ...
##  $ filter_algorithm            : Factor w/ 2 levels "progressive",..: 1 1 1 1 2 1 1 1 2 2 ...
##  $ filter_select_per_tournament: num  2.27 2.3 1.93 1.52 0.51 ...
##  $ yval                        : num  -0.499 -0.535 -0.54 -0.475 -0.506 ...

We want to look at the importance for the whole dataset (general case) and for the best configurations (top 20%).

Importance General

plotImportance(lcbenchTask)

Importance Best

plotImportance(bestTask)

For the general case, sample is the most important hyperparameter. The random_interleave_random parameter is of little importance. For the best configurations, filter_factor_first and filter_factor_last are the most important parameters and the sample parameter is no longer of importance. The ranking of the parameters has changed a lot, but the value of the importance measure has hardly changed for the parameters except for the sample parameter. We also look at a PCP:

plotParallelCoordinate(lcbenchTask)

It can be seen that there are too many observations to see much. The PCP makes more sense with fewer observations. After dividing the data, we first look for structural changes.

Summary All

summary(lcbenchSmashy)
##  budget_log_step   survival_fraction   surrogate_learner filter_with_max_budget
##  Min.   :-1.7528   Min.   :0.0000686   bohblrn:1372      FALSE:4801            
##  1st Qu.:-1.0795   1st Qu.:0.1877029   knn1   :3111      TRUE :5911            
##  Median :-0.4192   Median :0.3602689   knn7   :4803                            
##  Mean   :-0.3839   Mean   :0.4179906   ranger :1426                            
##  3rd Qu.: 0.3110   3rd Qu.:0.6339252                                           
##  Max.   : 1.0196   Max.   :0.9998031                                           
##  filter_factor_first random_interleave_fraction random_interleave_random
##  Min.   :0.000763    Min.   :0.0000227          FALSE:5008              
##  1st Qu.:2.791122    1st Qu.:0.1496729          TRUE :5704              
##  Median :4.452371    Median :0.3419693                                  
##  Mean   :4.139002    Mean   :0.3893602                                  
##  3rd Qu.:5.690380    3rd Qu.:0.6082803                                  
##  Max.   :6.907525    Max.   :0.9999744                                  
##     sample     filter_factor_last    filter_algorithm
##  bohb  :8763   Min.   :0.000763   progressive:3882   
##  random:1949   1st Qu.:2.462215   tournament :6830   
##                Median :4.267029                      
##                Mean   :3.960315                      
##                3rd Qu.:5.569787                      
##                Max.   :6.907578                      
##  filter_select_per_tournament      yval        
##  Min.   :0.001612             Min.   :-0.9647  
##  1st Qu.:1.000000             1st Qu.:-0.5923  
##  Median :1.000000             Median :-0.5377  
##  Mean   :1.086512             Mean   :-0.5646  
##  3rd Qu.:1.228722             3rd Qu.:-0.5189  
##  Max.   :2.397413             Max.   :-0.4690

Summary Best 20%

summary(lcbenchBest)
##  budget_log_step   survival_fraction  surrogate_learner filter_with_max_budget
##  Min.   :-1.7503   Min.   :0.000095   bohblrn: 130      FALSE: 731            
##  1st Qu.:-1.0406   1st Qu.:0.170492   knn1   : 796      TRUE :1412            
##  Median :-0.3780   Median :0.332510   knn7   :1161                            
##  Mean   :-0.3321   Mean   :0.381662   ranger :  56                            
##  3rd Qu.: 0.3890   3rd Qu.:0.523938                                           
##  Max.   : 1.0195   Max.   :0.999789                                           
##  filter_factor_first random_interleave_fraction random_interleave_random
##  Min.   :0.004248    Min.   :0.0000964          FALSE:1020              
##  1st Qu.:3.643269    1st Qu.:0.1208691          TRUE :1123              
##  Median :4.845318    Median :0.2392768                                  
##  Mean   :4.546724    Mean   :0.3170039                                  
##  3rd Qu.:5.870564    3rd Qu.:0.4727989                                  
##  Max.   :6.907525    Max.   :0.9979292                                  
##     sample     filter_factor_last    filter_algorithm
##  bohb  :2130   Min.   :0.004248   progressive: 798   
##  random:  13   1st Qu.:3.101750   tournament :1345   
##                Median :4.634717                      
##                Mean   :4.263191                      
##                3rd Qu.:5.721979                      
##                Max.   :6.907525                      
##  filter_select_per_tournament      yval        
##  Min.   :0.002426             Min.   :-0.5160  
##  1st Qu.:1.000000             1st Qu.:-0.5126  
##  Median :1.000000             Median :-0.5082  
##  Mean   :1.064477             Mean   :-0.5047  
##  3rd Qu.:1.101817             3rd Qu.:-0.4995  
##  Max.   :2.396205             Max.   :-0.4690

surrogate_learner: many “bohblrn” and “ranger” configurations were filtered out in disproportionate numbers. This could mean that these learners perform worse on average. filter_with_max_budget: proportionally more “FALSE” values were filtered out, which could mean that “TRUE” performs better on average. We can also see that only 13 rows of the best 20% of configurations used “random” sampling; the other (over 2100) instances used “bohb” sampling. That is also why the sample parameter has no importance for the subdivided data frame: there are barely any configurations with the level “random” left.
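The "filtered out in disproportionate numbers" argument can be quantified as a retention rate per factor level; a minimal sketch on synthetic stand-in data (on the real data one would divide table(lcbenchBest$surrogate_learner) by table(lcbenchSmashy$surrogate_learner)):

```r
# Sketch: share of each factor level that survives the top-20% cut.
# The data frame d is an illustrative stand-in for the real dataset.
set.seed(7)
d <- data.frame(
  learner = factor(sample(c("bohblrn", "knn1", "knn7", "ranger"),
                          2000, replace = TRUE)),
  yval    = runif(2000, -1, -0.4)
)
best <- d[d$yval >= quantile(d$yval, 0.8), ]
round(table(best$learner) / table(d$learner), 2)  # ~0.2 per level if the cut is uniform
```

A level whose retention rate is well below 0.2 (like ranger here: 56/1426 on the real data) is filtered out disproportionately often, which is the pattern read from the two summaries above.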

The hyperparameters are examined more precisely in the following sections.

Examination of the Parameters

sample

As we noticed, sample is the most important parameter in the full dataset. This parameter should have the right value to perform well, so let's look at the effect of the variable in a PDP. We also check whether the effect applies to all parameters. We can use a Heatmap to get a quick overview of the interactions; values close to 1 have hardly any effect on the result.

PDP

plotPartialDependence(lcbenchTask, features = c("sample"), rug = FALSE, plotICE = FALSE)

Heatmap

subplot(
plotHeatmap(lcbenchTask, features = c("sample", "budget_log_step"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "survival_fraction"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "surrogate_learner"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_with_max_budget"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_factor_first"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "random_interleave_fraction"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "random_interleave_random"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_factor_last"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_algorithm"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_select_per_tournament"), rug = FALSE),
nrows = 5,shareX = TRUE)

PDP: It can be seen that “bohb” samples always lead to better target values on average than “random” samples.

Heatmaps: Note that survival_fraction and random_interleave_fraction may give better results if a lower value is chosen. Also, the surrogate_learners knn1 and knn7 seem to give better results. On average, the “bohb” sample is better, but let's look at the best results and the combination of their instances.

We want to look at only the best configurations and verify that mostly “bohb” samples occur. Therefore we split the dataset into “bohb” and “random” samples.

random <- lcbenchSmashy[lcbenchSmashy$sample == "random",]
bohb <- lcbenchSmashy[lcbenchSmashy$sample == "bohb",]

randomTask <- TaskRegr$new(id = "task_random", backend = random, target = "yval")
bohbTask <- TaskRegr$new(id = "task_bohb", backend = bohb, target = "yval")

We split the entire dataset rather than only the best configurations because we expect differences between “random” and “bohb” samples: many “random” samples were filtered out and the parameter lost a lot of importance. For these reasons, we split the dataset and focus primarily on the “bohb” sample in what follows. For the best 20% of configurations we focus on “bohb” only.

Let’s check if there are differences in importance for the parameters in the “random” subset and the “bohb” subset.

Subset bohb

plotImportance(bohbTask)

Subset random

plotImportance(randomTask)

The hyperparameter survival_fraction is the most important parameter. Also random_interleave_fraction has high importance for both subsets. The parameters filter_algorithm and random_interleave_random do not seem to be important at all.

Bohb sample: The parameter budget_log_step is now more important; in the first plot, it was not ranked that high, so we can assume it is very important for this subset. The importance of the other parameters has not changed much compared to the full data, but the hyperparameters surrogate_learner and filter_with_max_budget are more important than for “random” samples.

Random sample: It looks like the right parameter configuration matters more in the “bohb” sample, because the parameter importance values there are generally higher than in the “random” sample. The parameters filter_factor_last and filter_factor_first have a higher importance in the “random” sample.

Top 20%

We could see in the beginning that most of the good results were gained with “bohb” samples. That’s why we will focus on “bohb” samples only from now on. That is, we remove the 13 rows of “random” samples from the underlying data.

bohbBest <- bohb[bohb$yval >= quantile(bohb$yval, 0.8),]
bohbBestTask <- TaskRegr$new(id = "bohbBestTask", backend = bohbBest, target = "yval")

survival_fraction

The survival_fraction parameter is the most important parameter for both samples of the entire dataset. With a PDP, we can gain better insight into how the parameter should be configured.

Subset bohb

plotPartialDependence(bohbTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE) 

Subset random

plotPartialDependence(randomTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)

In general, lower values achieve better performance than higher values. For the “bohb” subset, the best range seems to be between 0.15 and 0.6, which means that too low a value is not so good in this case. For the “random” subset, the curve is almost monotonically decreasing, which means that lower values are always better.

Top 20%

One possibility to find reasons for this structure is to filter the dataset again. For this, we can split the data according to the best 20% of yval values of the “bohb” samples.

plotPartialDependence(bohbBestTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE, gridsize = 20)

In this case, higher values seem to be somewhat better. This is surprising, since in the general case low values were better. It could mean that with good configurations of the other parameters, survival_fraction gives even better results when a high value is chosen. This could also explain the increase in the range between 0.5 and 0.75. Looking at the rug, we see that most configurations lie below 0.5 and the fewest lie above 0.75. Because of the few configurations with high values, the effect of good performance in this range is less pronounced. In the range between 0.5 and 0.75 there are more configurations, which therefore have a greater impact on the average curve. However, the difference on the y-axis is only small, so it cannot be concluded that high values are better.
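The rug argument can be made concrete by binning: sparsely populated ranges contribute a noisier, weaker-supported average. A sketch on synthetic data shaped like the situation described (most mass below 0.5, few points above 0.75; all names and the data are illustrative):

```r
# Sketch: binned mean of yval over survival_fraction together with the
# number of observations per bin (synthetic stand-in data).
set.seed(3)
sf <- rbeta(2000, 1.5, 2.5)                  # most configurations below 0.5
yv <- -0.55 + 0.02 * sf + rnorm(2000, 0, 0.02)
bins <- cut(sf, breaks = seq(0, 1, by = 0.25), include.lowest = TRUE)
data.frame(meanYval = round(tapply(yv, bins, mean), 3),
           n        = as.vector(table(bins)))  # small n -> less reliable mean
```

Reading a PDP together with such counts (or the rug) guards against over-interpreting a rise in a region supported by only a handful of configurations.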

surrogate_learner

Another important parameter for the “bohb” subset is the surrogate_learner.

plotPartialDependence(bohbTask, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)

In this graphic, knn1 and knn7 seem to be the best choices based on the results so far. For a more detailed analysis, we should divide the data by the individual surrogate_learners again and check whether there are differences in the importance of the remaining parameters.

knn1 <- bohb[bohb$surrogate_learner == "knn1",] 
knn7 <- bohb[bohb$surrogate_learner == "knn7",] 
bohblrn <- bohb[bohb$surrogate_learner == "bohblrn",]
ranger <- bohb[bohb$surrogate_learner == "ranger",]

knn1Task <- TaskRegr$new(id = "knn1Task", backend = knn1, target = "yval")
knn7Task <- TaskRegr$new(id = "knn7Task", backend = knn7, target = "yval")
bohblrnTask <- TaskRegr$new(id = "bohblrnTask", backend = bohblrn, target = "yval")
rangerTask <- TaskRegr$new(id = "rangerTask", backend = ranger, target = "yval")

Subset: knn1

plotImportance(knn1Task)

Subset: knn7

plotImportance(knn7Task)

Subset: bohblrn

plotImportance(bohblrnTask)

Subset: ranger

plotImportance(rangerTask)

The parameter survival_fraction is very important for the “bohblrn” and “knn1” subsets. This could already be seen in the PDP for survival_fraction. The hyperparameter random_interleave_fraction has high importance for all surrogate_learners. For the level “knn7”, the parameter budget_log_step seems to be more important than for the other levels of surrogate_learner. To check why the importance differs and whether the parameters have different good ranges, let's take a closer look at three very important parameters. Later we check each level separately for the top 20% of the configurations to find differences.

knn1: random_interleave_fraction

plotPartialDependence(knn1Task, "random_interleave_fraction", plotICE = FALSE)

knn7: random_interleave_fraction

plotPartialDependence(knn7Task, "random_interleave_fraction", plotICE = FALSE)

bohblrn: random_interleave_fraction

plotPartialDependence(bohblrnTask, "random_interleave_fraction", plotICE = FALSE)

ranger: random_interleave_fraction

plotPartialDependence(rangerTask, "random_interleave_fraction", plotICE = FALSE)

For “knn1”, lower random_interleave_fraction values seem to be better. For “knn7” and “bohblrn”, the values should be neither too high nor too low, and for “ranger”, higher values lead to better yval results. A good range for “bohblrn” seems to be between 0.05 and 0.65, for “knn1” between 0.05 and 0.5, and for “knn7” between 0.25 and 0.75.

knn1: survival_fraction

plotPartialDependence(knn1Task, "survival_fraction", plotICE = FALSE)

knn7: survival_fraction

plotPartialDependence(knn7Task, "survival_fraction", plotICE = FALSE)

bohblrn: survival_fraction

plotPartialDependence(bohblrnTask, "survival_fraction", plotICE = FALSE)

ranger: survival_fraction

plotPartialDependence(rangerTask, "survival_fraction", plotICE = FALSE)

Low values for survival_fraction are better in general and could be set under 0.5, but high values are worst for “bohblrn”. For the surrogate_learner “knn7”, values around 0.5 seem to produce the best performance; for “knn1”, a good choice is between 0.1 and 0.6; for all other levels, values under 0.5 are better.

knn1: budget_log_step

plotPartialDependence(knn1Task, "budget_log_step", gridsize = 40, plotICE = FALSE)

knn7: budget_log_step

plotPartialDependence(knn7Task, "budget_log_step", gridsize = 40, plotICE = FALSE)

bohblrn: budget_log_step

plotPartialDependence(bohblrnTask, "budget_log_step", plotICE = FALSE)

ranger: budget_log_step

plotPartialDependence(rangerTask, "budget_log_step", plotICE = FALSE)

It is very interesting that the curve for budget_log_step shows repeated dips; this occurs only for the levels “knn1” and “knn7”. The range is hard to identify, since it also depends on the gridsize of the plot. It can be said that a value over -0.5 is a good choice for “knn7” and “ranger”. For “bohblrn” there are repeated dips, but the value should be over -0.5. For “knn1” and “knn7”, values between -0.5 and 1 seem to achieve good results.

Top 20%

We also want to investigate the best cases and, for this, directly check the subdivided datasets. We will search for and analyze the most important parameters with the Importance Plot. In addition, we will examine abnormalities in the PCP in more detail and also look at some summaries.

plotPartialDependence(bestTask, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)

The level “bohblrn” of the surrogate_learner parameter is now the most important, and the level “ranger” is clearly more important than before.

surrogate_learner bohblrn

Let's investigate the surprising outcome for the surrogate_learner level “bohblrn”.

bohblrnBest <- bohbBest[bohbBest$surrogate_learner == "bohblrn",]

bohblrnTaskBest <- TaskRegr$new(id = "bohblrnTask", backend = bohblrnBest, target = "yval")

PCP bohblrn

plotParallelCoordinate(bohblrnTaskBest, labelangle = 10)

Importance Plot bohblrn

plotImportance(bohblrnTaskBest)

PCP: A high value for filter_factor_last could be better, since many lines there reach high yval values. The filter_with_max_budget parameter should be set to “TRUE” and filter_algorithm should be set to “progressive”. It looks like high budget_log_step values achieve the best results. The parameter filter_factor_first should not be restricted.

Importance Plot: In the general case for “bohblrn”, survival_fraction was by far the most important parameter; now it is budget_log_step, followed by filter_with_max_budget.

Let's investigate why the survival_fraction parameter lost importance.

bohblrn: full Dataset

plotPartialDependence(bohblrnTask, "survival_fraction")

bohblrn: subdivided Dataset

plotPartialDependence(bohblrnTaskBest, "survival_fraction")

Previously, a high survival_fraction led to a drop, but one can see that it does not affect the very good results. Here we can see why ICE curves can be useful as an addition to the PDP.

Let us examine the other important parameters from the PCP and the Importance Plot for the surrogate_learner level “bohblrn”.

bohblrn: PDP budget_log_step

plotPartialDependence(bohblrnTaskBest, "budget_log_step", gridsize = 30, plotICE = FALSE)

bohblrn: PDP filter_with_max_budget

plotPartialDependence(bohblrnTaskBest, "filter_with_max_budget")

bohblrn: PDP filter_factor_last

plotPartialDependence(bohblrnTaskBest, "filter_factor_last", plotICE = FALSE)

bohblrn: PDP filter_algorithm

plotPartialDependence(bohblrnTaskBest, "filter_algorithm")
summary(bohblrnBest$filter_algorithm)
## progressive  tournament 
##          63          54
summary(bohblrn$filter_algorithm)
## progressive  tournament 
##         278         590

In general, budget_log_step performs better with higher values, although the poorer predictions barely increase with higher configuration values. There are also small dips around -0.3 to 0.5.

filter_with_max_budget should be set to “TRUE”. More observations remain than in the subset with the level “FALSE”; proportionally, more “FALSE” values have already been filtered out, which is another indication that “TRUE” is the better choice for yval.

For filter_factor_last, high values could perform best because, even though the differences are small, there are more observations there than in other ranges. A good choice for a configuration is over 5.

The thesis that filter_algorithm should be “progressive” can be confirmed. The Partial Dependence Plot doesnt show it but a lot of tournament got filtered out.
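The filtering argument can be made quantitative. A minimal self-contained sketch, using only the filter_algorithm counts from the two summaries above, compares the share of “progressive” configurations before and after restricting to the best results:

```r
# Counts of filter_algorithm taken from the summary() outputs above
full <- c(progressive = 278, tournament = 590)
best <- c(progressive = 63, tournament = 54)

# Share of "progressive" rises from ~0.32 to ~0.54 in the best subset
shareFull <- round(full[["progressive"]] / sum(full), 2)
shareBest <- round(best[["progressive"]] / sum(best), 2)
shareFull  # 0.32
shareBest  # 0.54
```

The disproportionate loss of “tournament” rows is what the PDP alone cannot show.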

surrogate_learner knn1

Let’s investigate the surprising outcome for the surrogate_learner class “knn1”.

knn1Best <- bohbBest[bohbBest$surrogate_learner == "knn1",]

knn1BestTask <- TaskRegr$new(id = "knn1BestTask", backend = knn1Best, target = "yval")

PCP knn1

plotParallelCoordinate(knn1BestTask, labelangle = 10)

Importance Plot knn1

plotImportance(knn1BestTask)

PCP: The parameter filter_with_max_budget should be set to “TRUE”. It looks like there are specific ranges of budget_log_step which bring better results. The hyperparameter survival_fraction should be high and the parameter random_interleave_fraction should be low for good results. High filter_factor_last values could be better, since many lines there reach high yval values. The parameter filter_select_per_tournament should be set to 1.

Importance Plot: The parameters filter_factor_first, survival_fraction, and filter_factor_last are the most important according to the Importance Plot.

Next, the interesting parameters according to the PCP and Importance Plot are examined.

knn1: PDP filter_factor_first

plotPartialDependence(knn1BestTask, "filter_factor_first", plotICE = FALSE )

knn1: PDP survival_fraction

plotPartialDependence(knn1BestTask, "survival_fraction", plotICE = FALSE)

knn1: PDP filter_factor_last

plotPartialDependence(knn1BestTask, "filter_factor_last", plotICE = FALSE)

knn1: PDP filter_with_max_budget

plotPartialDependence(knn1BestTask, "filter_with_max_budget")

knn1: PDP budget_log_step

plotPartialDependence(knn1BestTask, "budget_log_step", plotICE = FALSE)

knn1: PDP filter_select_per_tournament

plotPartialDependence(knn1BestTask, "filter_select_per_tournament", plotICE = FALSE)

knn1: PDP random_interleave_fraction

plotPartialDependence(knn1BestTask, "random_interleave_fraction", plotICE = FALSE)

In general, the parameter filter_factor_first seems to produce better results in low ranges; the best results lie in configuration ranges under 4. The variable survival_fraction should get a value over 0.5 (interesting, because in the general case lower values were better!). The hyperparameters filter_factor_last and random_interleave_fraction do not really tell us where the best configurations are.

surrogate_learner knn7

knn7Best <- bohbBest[bohbBest$surrogate_learner == "knn7",]

knn7BestTaskBest <- TaskRegr$new(id = "knn7Task", backend = knn7Best, target = "yval")

PCP knn7

plotParallelCoordinate(knn7BestTaskBest, labelangle = 10)

Importance Plot knn7

plotImportance(knn7BestTaskBest)

PCP: filter_algorithm should be “tournament”. filter_factor_first should be around 4. random_interleave_random should be “FALSE”. survival_fraction seems to be low. The parameter filter_with_max_budget should be set to “TRUE”. The hyperparameter random_interleave_fraction should get a low value and the parameter filter_select_per_tournament a value around 1.

Importance Plot: The most important parameters are filter_factor_first, filter_factor_last and budget_log_step.

knn7: PDP filter_factor_first

plotPartialDependence(knn7BestTaskBest, "filter_factor_first", plotICE = FALSE )

knn7: PDP filter_factor_last

plotPartialDependence(knn7BestTaskBest, "filter_factor_last", plotICE = FALSE)

knn7: PDP budget_log_step

plotPartialDependence(knn7BestTaskBest, "budget_log_step", plotICE = FALSE)

knn7: PDP filter_algorithm

plotPartialDependence(knn7BestTaskBest, "filter_algorithm", plotICE = FALSE)

knn7: PDP random_interleave_random

plotPartialDependence(knn7BestTaskBest, "random_interleave_random")

knn7: PDP survival_fraction

plotPartialDependence(knn7BestTaskBest, "survival_fraction", plotICE = FALSE)

knn7: PDP random_interleave_fraction

plotPartialDependence(knn7BestTaskBest, "random_interleave_fraction", plotICE = FALSE)

knn7: PDP filter_select_per_tournament

plotPartialDependence(knn7BestTaskBest, "filter_select_per_tournament", plotICE = FALSE)

knn7: PDP filter_with_max_budget

plotPartialDependence(knn7BestTaskBest, "filter_with_max_budget")

The parameter filter_factor_first should be under 4; budget_log_step produces the best values over 0.5 but has no big impact in general. Again, we do not see the perfect range for filter_factor_last and random_interleave_fraction, and we cannot confirm with certainty that “tournament” is always better. random_interleave_random should be “FALSE”. filter_select_per_tournament should be over 0.5. filter_with_max_budget should be “TRUE”.

surrogate_learner ranger

Finally, the “ranger” learner should be investigated, since the average performance for good configurations increased a lot.

rangerBest <- bohbBest[bohbBest$surrogate_learner == "ranger",]

rangerBestTaskBest <- TaskRegr$new(id = "rangerBestTask", backend = rangerBest, target = "yval")

PCP ranger

plotParallelCoordinate(rangerBestTaskBest, labelangle = 10)

Importance Plot ranger

plotImportance(rangerBestTaskBest)

PCP: budget_log_step should be high. filter_with_max_budget should be “TRUE”.

Importance Plot: The most important parameters are filter_factor_first, filter_with_max_budget and budget_log_step.

ranger: PDP filter_factor_first

plotPartialDependence(rangerBestTaskBest, "filter_factor_first", plotICE = FALSE)

ranger: PDP budget_log_step

plotPartialDependence(rangerBestTaskBest, "budget_log_step", plotICE = FALSE)

ranger: PDP filter_with_max_budget

plotPartialDependence(rangerBestTaskBest, "filter_with_max_budget", plotICE = FALSE)

A high budget_log_step and a low filter_factor_first seem to produce the best performance. For budget_log_step a value over -0.5 seems to be good; for filter_factor_first a value under 2.5 performs best. Note that only around 45 observations are left, so the interpretation is not that clear. The parameter filter_with_max_budget should be set to “TRUE”.

budget_log_step

Another important parameter for the “bohb” samples is budget_log_step. Let’s have a look at the PDP.

PDP full Data

plotPartialDependence(bohbTask,"budget_log_step", plotICE = FALSE)

subdivided dataset

plotPartialDependence(bohbBestTask, features = c("budget_log_step"), plotICE = FALSE)

In general, the value for budget_log_step should be over -0.5. A high value seems a good choice in the subdivided dataset. However, we could also see before that the parameter varies greatly for the surrogate_learners “knn1” and “knn7”; therefore the parameter is assigned a high importance without it being clear how best to set it.

random_interleave_fraction

random_interleave_fraction can vary between 0 and 1. This parameter had a high importance in the “bohb” sample and in the “random” sample, slightly more in the “random” sample. Let’s check this parameter.

bohb Subset

plotPartialDependence(bohbTask, features = c("random_interleave_fraction"), plotICE = FALSE)

random Subset

plotPartialDependence(randomTask, features = c("random_interleave_fraction"), plotICE = FALSE)

For random_interleave_fraction and the “bohb” sample, a good choice is a value which is neither too high nor too low, since those give the worst performance; a good value seems to be between 0.1 and 0.7. For the “random” sample, low values bring better performance.

top 20%

plotPartialDependence(bohbBestTask, features = c("random_interleave_fraction"), plotICE = FALSE)

In the top-20% case, there is no bad range at the edges.

filter_factor_last

The parameter filter_factor_last was less important, but a quick check is worthwhile as well.

full dataset

plotPartialDependence(bohbTask, "filter_factor_last", plotICE = FALSE)

subdivided Dataset

plotPartialDependence(bohbBestTask, features = c("filter_factor_last"), plotICE = FALSE)

The effect is low, and the value should only be chosen according to the surrogate_learner.

filter_with_max_budget

full Dataset

plotPartialDependence(bohbTask, features = c("filter_with_max_budget"), rug = FALSE)

subdivided Dataset

plotPartialDependence(bohbBestTask, features = c("filter_with_max_budget"), rug = FALSE)

The parameter filter_with_max_budget has a weak effect but should be set to “TRUE”.

filter_select_per_tournament

The parameter filter_select_per_tournament had barely an effect in the general case but became a little more important in the top 20% of configurations. We check the partial dependence and the dependencies with the most important parameters to get more insight.

PDP filter_select_per_tournament

plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament"), plotICE = FALSE)

PDP: Combination with survival_fraction

plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament", "survival_fraction"), rug = FALSE, gridsize = 10)

PDP: Combination with filter_factor_first

plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament", "filter_factor_first"), rug = FALSE, gridsize = 10)

PDP: Combination with filter_factor_last

plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament", "filter_factor_last"), rug = FALSE, gridsize = 10)

The effect is weak and may come from the peaks around 1. The parameter value should probably be chosen around 1 or slightly above, but the choice should not have much effect.

filter_factor_first

The parameter filter_factor_first ranked very high in the parameter Importance Plot for the top configurations.

PDP filter_factor_first

plotPartialDependence(bohbBestTask, features = c("filter_factor_first"), gridsize = 20, plotICE = FALSE)

PDP: Combination with filter_factor_last

plotPartialDependence(bohbBestTask, features = c("filter_factor_first", "filter_factor_last"), rug = FALSE, gridsize = 10)

PDP: Combination with survival_fraction

plotPartialDependence(bohbBestTask, features = c("filter_factor_first", "survival_fraction"), rug = FALSE, gridsize = 10)

PDP: Combination with budget_log_step

plotPartialDependence(bohbBestTask, features = c("filter_factor_first", "budget_log_step"), rug = FALSE, gridsize = 10)

In general, lower values for filter_factor_first achieve slightly better performance. But the differences are small and should not lead to a change in the considerations made.

Dataset: smashy_super

For the dataset smashy_super the target is yval, which is a logloss performance measurement. Values close to 0 mean good performance. First of all, we want to know which parameters are important in general.

Data Preparation

We need to subset the data in order to compare the whole dataset with the subset containing the 20% of configurations with the best outcomes. In addition, the data must be manipulated to facilitate its use for summaries and filters.

Load Data

superSmashy <- readRDS("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/package_VisHyp/data-raw/smashy_super.rds")
superSmashy <- as.data.frame(superSmashy)

# Convert logical and character columns to factors for use in mlr3 tasks
for (i in seq_len(ncol(superSmashy))) {
  if (is.logical(superSmashy[, i]) || is.character(superSmashy[, i]))
    superSmashy[, i] <- as.factor(superSmashy[, i])
}
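As a sketch of an equivalent, loop-free alternative (shown on a small hypothetical data frame, not on the thesis data), the same conversion can be written with lapply:

```r
# Hypothetical example data frame with a logical, a character and a numeric
# column, mirroring the column types in smashy_super
df <- data.frame(flag = c(TRUE, FALSE),
                 algo = c("progressive", "tournament"),
                 yval = c(-0.22, -0.23),
                 stringsAsFactors = FALSE)

# Convert logical and character columns to factors, leave the rest untouched
df[] <- lapply(df, function(col)
  if (is.logical(col) || is.character(col)) as.factor(col) else col)
```

The `df[]` assignment keeps the data.frame class while replacing the columns.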

Create Task

superTask <- TaskRegr$new(id = "superSmashy", backend = superSmashy, target = "yval")
superBest <- superSmashy[superSmashy$yval >= quantile(superSmashy$yval, 0.8),]
superBestTask <- TaskRegr$new(id = "bestTask", backend = superBest, target = "yval")

Results

The target parameter yval can reach values between -0.3732 and -0.2105. Our goal is to obtain good results, i.e., to find configurations that produce values close to -0.2105.

The parameter sample performs better on average with the factor “random” than with the factor “bohb”. For the top 20% of configurations, many “bohb” samples have been sorted out, but the remaining ones have on average a better performance than the “random” samples. In the end, both samples can lead to good performance values, but since a lot of the remaining samples are “random” we will choose this factor.

In general, lower values of the parameter survival_fraction perform better than higher ones. Both subsets start with a low value and reach their maximum directly afterwards. For the top configurations, higher values do not seem to be worse, so with matching configurations of the other parameters this value can also be high. Although not all high values have poor performance, lower values seem to be the right choice, since most good configurations have lower values. A value between 0.05 and 0.30 seems to be a good choice for the factor “knn1” of the surrogate_learner parameter.

The surrogate_learner parameter is one of the most important parameters for the whole dataset. After reducing the dataset to the best 20% of configurations, we could see that the parameter lost importance, since the best configurations mainly had the “knn1” factor. Even though we found that for all other surrogate_learners the best configuration could achieve better yval values than with “knn1”, it makes sense to choose “knn1” because of the better results on average.

The most important parameter for the best 20% of configurations was random_interleave_fraction. In this case the results were unambiguous: higher values led to better results for both the full dataset and the subset. Another early indicator in the analysis was the summary of the full and the subdivided dataset, where all summary indices for the subset were higher. All effect tools such as the PDP, PCP, and Heatmap showed these results as well. For our purpose, we only take values above 0.5, which is about half the range.

A similar problem as with surrogate_learner occurs with budget_log_step. In the full dataset higher values are better, but in the top 20% of configurations lower values achieve better yval values. Unlike with surrogate_learner, however, there are more configurations with good results in the subdivided dataset. It is also a very important parameter for the top 20% of configurations, so it should not be neglected that good performance can be achieved with lower budget_log_step values. In this case it is better not to limit the parameter.

For the best parameter configurations in combination with the “knn1” factor of the surrogate_learner parameter, filter_factor_first was the most important parameter. In the full dataset, this parameter was not important at all. There is also a difference in the range of good configurations: in the full dataset, values above 6 did not perform well, while in the subdivided dataset values above 6 produced the best results. Even after subdividing into the best 20% of configurations, the majority of good values were above 4, so values above 4 seem to be a good choice for this parameter.

The interpretation of filter_factor_last was a little more complicated. It shows large fluctuations and different good ranges depending on whether we look at the full or the partial dataset. Moreover, although the importance is high due to the large fluctuations, the range of predicted performances is not very large (which actually argues against the importance). In general, however, the parameter value for filter_factor_last should be between 1.5 and 2.5, or above 5.5, or at least not between 4 and 5.

A really easy parameter to interpret is filter_with_max_budget. It is not really important in the full dataset, but for the best configurations in combination with “knn1” one can say that “TRUE” should be the choice.

filter_algorithm, filter_select_per_tournament and random_interleave_random have barely any effect and therefore do not need to be limited.

Data Constraint to Check the Results

To verify the proposed parameter configurations, we constrain the dataset and compare the obtained performance with the ranks of the performance of the whole dataset.

superEvaluation <- superSmashy[superSmashy$sample == "random",] 
superEvaluation <- superEvaluation[superEvaluation$survival_fraction > 0.05 & superEvaluation$survival_fraction < 0.3,] 
superEvaluation <- superEvaluation[superEvaluation$surrogate_learner == "knn1",] 
superEvaluation <- superEvaluation[superEvaluation$random_interleave_fraction > 0.5,]
superEvaluation <- superEvaluation[superEvaluation$filter_factor_first > 4,]
superEvaluation <- superEvaluation[superEvaluation$filter_factor_last < 4 | superEvaluation$filter_factor_last > 5,]
superEvaluation <- superEvaluation[superEvaluation$filter_with_max_budget == "TRUE",]

superYval <- sort(superEvaluation$yval, decreasing = TRUE)
superYvalOriginal <- sort(superSmashy$yval, decreasing = TRUE)
sort(match(superYval, superYvalOriginal), decreasing = FALSE)
##  [1]   20   40   49   58   62   69   79  107  112  115  116  130  152  161  162
## [16]  178  182  184  189  206  208  218  238  241  242  264  274  276  277  280
## [31]  295  296  300  305  318  319  331  332  336  340  356  377  378  382  388
## [46]  393  404  432  434  442  446  450  452  486  489  490  501  509  513  534
## [61]  539  547  550  568  578  602  604  605  621  626  632  637  661  664  682
## [76]  722  731  737  744  754  774  794  818  823  838  839  853  920  922  971
## [91]  987  996 1152 1292 1468 1470

We can see that many good results were obtained, but not nearly all of the best configurations were found. This can be explained by the fact that we often imposed constraints simply to reduce the size of the dataset. For example, for some categorical parameters we always chose one factor even though we knew that other categories could also yield good values. Furthermore, numerical parameters were partly restricted, although it was known that very good yval values can also be obtained outside the chosen ranges. In the end, however, we were able to show that the restricted ranges lead almost exclusively to above-average or good performance values. Finally, the metrics are calculated again; their meaning can be found in the bachelor thesis.
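The rank check above works by matching each sorted subset value against the sorted full vector. A tiny self-contained sketch of the same idea on hypothetical toy values:

```r
# Toy illustration of the rank comparison: higher yval is better, so rank 1
# belongs to the largest value of the full vector
allYval <- c(-0.25, -0.22, -0.30, -0.21, -0.24)
subYval <- c(-0.22, -0.24)

ranks <- match(sort(subYval, decreasing = TRUE),
               sort(allYval, decreasing = TRUE))
ranks  # the subset values are the 2nd and 3rd best overall
```

Low ranks for the constrained subset indicate that the suggested configuration ranges capture top performers.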

#summary
summary(superSmashy$yval)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.3732 -0.2390 -0.2331 -0.2347 -0.2278 -0.2105
#proportion
length(superYval)/length(superSmashy$yval)
## [1] 0.03374341
#top configuration
sum(superYval >= quantile(superSmashy$yval, 0.95))/length(superYval)
## [1] 0.125
sum(superYval >= quantile(superSmashy$yval, 0.8))/length(superYval)
## [1] 0.6666667
#quantile
sum(superSmashy$yval<=max(superYval))/length(superSmashy$yval)
## [1] 0.9933216
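The coverage metrics above ask what fraction of the constrained subset lands above a global quantile. A minimal sketch of the same computation on hypothetical toy values:

```r
# Toy version of the coverage metric: share of subset values that lie in
# the global top 20% (higher yval = better)
allY <- c(-0.25, -0.24, -0.23, -0.22, -0.21)
subY <- c(-0.21, -0.24)
share <- sum(subY >= quantile(allY, 0.8)) / length(subY)
share  # 0.5: one of the two subset values is a global top-20% value
```

In the real data above, two thirds of the constrained configurations lie in the global top 20%.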

Visual Overview

This can be checked visually with the implemented PCP. For a better overview, the color range is somewhat restricted, since there are very few observations below -0.3. For a better comparison, the presumed good configuration range and the presumed worse configuration range of the parameters are each shown once.

Limitation to very good configurations

knitr::include_graphics("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Latex/Grafiken/Super_Best_PCP.png")

Limitation to bad configurations

knitr::include_graphics("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Latex/Grafiken/Super_Bad_PCP.png")

Overview

An overview is obtained again.

Head

head(superSmashy)
##   budget_log_step survival_fraction surrogate_learner filter_with_max_budget
## 1      0.11449875        0.26100298              knn7                  FALSE
## 2     -0.42921649        0.33760502              knn7                   TRUE
## 3      0.04823162        0.01486055              knn7                   TRUE
## 4     -1.44318828        0.57712483              knn7                   TRUE
## 5      0.37983696        0.16755070           bohblrn                  FALSE
## 6      0.11449875        0.85519272              knn7                  FALSE
##   filter_factor_first random_interleave_fraction random_interleave_random
## 1         0.233780263                  0.2254148                     TRUE
## 2         3.756367542                  0.1042924                     TRUE
## 3         1.002387921                  0.5424223                    FALSE
## 4         6.404499751                  0.6294822                     TRUE
## 5         0.004248442                  0.7319387                     TRUE
## 6         5.105712766                  0.6763331                     TRUE
##   sample filter_factor_last filter_algorithm filter_select_per_tournament
## 1   bohb          0.3870927      progressive                    2.2749194
## 2 random          1.5890745      progressive                    2.2996638
## 3 random          2.9274948      progressive                    1.9313954
## 4   bohb          1.8534344       tournament                    1.7707135
## 5 random          4.0016987       tournament                    2.2842471
## 6   bohb          3.8174711       tournament                    0.4610276
##         yval
## 1 -0.2205114
## 2 -0.2158789
## 3 -0.2123531
## 4 -0.2121151
## 5 -0.2117795
## 6 -0.2186847

Structure

str(superSmashy)
## 'data.frame':    2845 obs. of  12 variables:
##  $ budget_log_step             : num  0.1145 -0.4292 0.0482 -1.4432 0.3798 ...
##  $ survival_fraction           : num  0.261 0.3376 0.0149 0.5771 0.1676 ...
##  $ surrogate_learner           : Factor w/ 4 levels "bohblrn","knn1",..: 3 3 3 3 1 3 3 3 4 4 ...
##  $ filter_with_max_budget      : Factor w/ 2 levels "FALSE","TRUE": 1 2 2 2 1 1 2 2 1 2 ...
##  $ filter_factor_first         : num  0.23378 3.75637 1.00239 6.4045 0.00425 ...
##  $ random_interleave_fraction  : num  0.225 0.104 0.542 0.629 0.732 ...
##  $ random_interleave_random    : Factor w/ 2 levels "FALSE","TRUE": 2 2 1 2 2 2 1 1 2 1 ...
##  $ sample                      : Factor w/ 2 levels "bohb","random": 1 2 2 1 2 1 1 2 2 2 ...
##  $ filter_factor_last          : num  0.387 1.589 2.927 1.853 4.002 ...
##  $ filter_algorithm            : Factor w/ 2 levels "progressive",..: 1 1 1 2 2 2 1 1 2 2 ...
##  $ filter_select_per_tournament: num  2.27 2.3 1.93 1.77 2.28 ...
##  $ yval                        : num  -0.221 -0.216 -0.212 -0.212 -0.212 ...

We want to look at the importance for the whole dataset (general case) and for the best configurations (top 20%).

Importance General

plotImportance(task = superTask)

Importance Best

plotImportance(task = superBestTask)

For the full dataset, surrogate_learner is the most important and sample the second most important hyperparameter. After filtering the dataset, both parameters lose much of their importance and have little effect, so random_interleave_fraction becomes the most important parameter. Parameters like filter_algorithm, random_interleave_random and filter_with_max_budget have no effect on either the full or the filtered dataset.

After we have subdivided the data, we also want to look for structural changes in the summary.

Summary All

summary(superSmashy)
##  budget_log_step   survival_fraction   surrogate_learner filter_with_max_budget
##  Min.   :-1.7509   Min.   :0.0001849   bohblrn: 374      FALSE:1119            
##  1st Qu.:-0.8770   1st Qu.:0.1864801   knn1   :1658      TRUE :1726            
##  Median :-0.0860   Median :0.3550278   knn7   : 478                            
##  Mean   :-0.2054   Mean   :0.4194451   ranger : 335                            
##  3rd Qu.: 0.4727   3rd Qu.:0.6533882                                           
##  Max.   : 1.0186   Max.   :0.9999182                                           
##  filter_factor_first random_interleave_fraction random_interleave_random
##  Min.   :0.004248    Min.   :0.000615           FALSE:1624              
##  1st Qu.:2.454531    1st Qu.:0.308627           TRUE :1221              
##  Median :4.393864    Median :0.545574                                   
##  Mean   :4.066960    Mean   :0.536262                                   
##  3rd Qu.:5.794467    3rd Qu.:0.774285                                   
##  Max.   :6.906027    Max.   :0.999015                                   
##     sample     filter_factor_last    filter_algorithm
##  bohb  :1226   Min.   :0.004248   progressive: 909   
##  random:1619   1st Qu.:2.268931   tournament :1936   
##                Median :4.183293                      
##                Mean   :3.911979                      
##                3rd Qu.:5.670457                      
##                Max.   :6.906027                      
##  filter_select_per_tournament      yval        
##  Min.   :0.0009299            Min.   :-0.3732  
##  1st Qu.:1.0000000            1st Qu.:-0.2390  
##  Median :1.0000000            Median :-0.2331  
##  Mean   :1.0740216            Mean   :-0.2347  
##  3rd Qu.:1.0869452            3rd Qu.:-0.2278  
##  Max.   :2.3956034            Max.   :-0.2105

Summary Best 20%

summary(superBest)
##  budget_log_step    survival_fraction  surrogate_learner filter_with_max_budget
##  Min.   :-1.74596   Min.   :0.000291   bohblrn:  2       FALSE:127             
##  1st Qu.:-0.46235   1st Qu.:0.121852   knn1   :546       TRUE :442             
##  Median : 0.25398   Median :0.256286   knn7   : 19                             
##  Mean   : 0.04121   Mean   :0.320271   ranger :  2                             
##  3rd Qu.: 0.61932   3rd Qu.:0.433896                                           
##  Max.   : 1.01297   Max.   :0.992048                                           
##  filter_factor_first random_interleave_fraction random_interleave_random
##  Min.   :0.004248    Min.   :0.02443            FALSE:337               
##  1st Qu.:3.697472    1st Qu.:0.43278            TRUE :232               
##  Median :5.308223    Median :0.63116                                    
##  Mean   :4.710573    Mean   :0.61323                                    
##  3rd Qu.:6.174077    3rd Qu.:0.82455                                    
##  Max.   :6.899001    Max.   :0.98931                                    
##     sample    filter_factor_last    filter_algorithm
##  bohb  :156   Min.   :0.1005     progressive:202    
##  random:413   1st Qu.:2.7705     tournament :367    
##               Median :4.8008                        
##               Mean   :4.3414                        
##               3rd Qu.:6.0197                        
##               Max.   :6.8990                        
##  filter_select_per_tournament      yval        
##  Min.   :0.001125             Min.   :-0.2270  
##  1st Qu.:1.000000             1st Qu.:-0.2261  
##  Median :1.000000             Median :-0.2249  
##  Mean   :1.055841             Mean   :-0.2244  
##  3rd Qu.:1.000000             3rd Qu.:-0.2234  
##  Max.   :2.381424             Max.   :-0.2105

This summary already explains why the parameter surrogate_learner lost most of its importance: many “bohblrn”, “knn7” and “ranger” configurations were removed. This could mean that these learners perform worse on average than the “knn1” learner. For the parameter filter_with_max_budget, a disproportionate number of configurations with “FALSE” were filtered out, which could mean that “TRUE” performs better on average. It is also noticeable that the summary values of survival_fraction decreased, while those of budget_log_step, filter_factor_first and random_interleave_fraction increased. Finally, a disproportionate number of “bohb” samples also dropped out of the dataset, perhaps an indication that “random” samples gave better results.
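This shift can be quantified directly from the surrogate_learner counts in the two summaries above; a small self-contained sketch:

```r
# surrogate_learner counts taken from the two summary() outputs above
fullCounts <- c(bohblrn = 374, knn1 = 1658, knn7 = 478, ranger = 335)
bestCounts <- c(bohblrn = 2, knn1 = 546, knn7 = 19, ranger = 2)

# The "knn1" share jumps from roughly 58% to roughly 96%
shareKnn1Full <- round(fullCounts[["knn1"]] / sum(fullCounts), 2)
shareKnn1Best <- round(bestCounts[["knn1"]] / sum(bestCounts), 2)
shareKnn1Full  # 0.58
shareKnn1Best  # 0.96
```

A factor whose share barely changes after filtering would, by the same argument, carry little information about good configurations.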

The hyperparameters will be examined more precisely in the following sections.

Examination of the parameters

sample

As we found out, sample is again an important parameter in the full dataset and can take the values “bohb” or “random”. This parameter should have the right value for good performance. Therefore, let us consider the effects of the parameter in a Partial Dependence Plot. We also check whether the effect applies across all parameters. We can use a Heatmap to get a quick overview of interactions. Values close to 1 have barely an effect on the outcome.

PDP

plotPartialDependence(superTask, features = c("sample"), rug = FALSE, plotICE = FALSE)

Heatmap

subplot(
plotHeatmap(superTask, features = c("sample", "budget_log_step"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "survival_fraction"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "surrogate_learner"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "filter_with_max_budget"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "filter_factor_first"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "random_interleave_fraction"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "random_interleave_random"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "filter_factor_last"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "filter_algorithm"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "filter_select_per_tournament"), rug = FALSE),
nrows = 5,shareX = TRUE)

In the PDP, it can be seen that “random” samples lead to better results on average than “bohb” samples. In the heatmaps, it can be seen that the predicted performances may be better when filter_with_max_budget is set to “TRUE”, random_interleave_fraction is given a high value and survival_fraction a low value. As suspected since the summary, the factor “knn1” of the surrogate_learner parameter gives the best results on average.

Top 20%

We can split the data according to the best 20% of yval values of the dataset and check whether the outcome of a PDP is different.

plotPartialDependence(superBestTask, features = c("sample"), rug = TRUE, plotICE = TRUE)

A lot of “bohb” samples were sorted out, but the remaining ones have on average a better performance than the “random” samples. Furthermore, we assume differences between the “random” and “bohb” samples, since the parameter lost much of its importance after filtering. Since both subsets seem important for further analysis, we therefore split the entire dataset into “bohb” and “random” samples.

superRandom <- superSmashy[superSmashy$sample == "random",]
superBohb <- superSmashy[superSmashy$sample == "bohb",]

superRandomTask <- TaskRegr$new(id = "task_superRandom", backend = superRandom, target = "yval")
superBohbTask <- TaskRegr$new(id = "task_superBohb", backend = superBohb, target = "yval")

Let’s check if there are differences in importance for the parameters in the “random” subset and the “bohb” subset.

Subset bohb

plotImportance(task = superBohbTask)

Subset random

plotImportance(task = superRandomTask)

The hyperparameters surrogate_learner and random_interleave_fraction are still the most important parameters for both constrained datasets. In fact, the importance did not change a lot.

There is little difference between the two factors of the sample parameter in the full dataset. We did find that the majority of the good results were obtained with the “random” samples, but for further analysis we will look at both the “random” subset and the “bohb” subset.

survival_fraction

The survival_fraction parameter was moderately important for both samples of the entire dataset, but based on the summary we assumed that low values may lead to better performance. This parameter can take values between 0.00007 and 0.9998. Let us explore this assumption with a PDP.

Subset bohb

plotPartialDependence(superBohbTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE) 

Subset random

plotPartialDependence(superRandomTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)

In general, lower values perform better than higher values. Both subsets start with a low value and reach their maximum directly afterwards. This means that the value should probably be low, but not minimal. For both subsets, the best range seems to be between 0.05 and 0.25. While the “random” curve is almost monotonically decreasing, the “bohb” sample has another peak between 0.5 and 0.75.

Top 20%

One way to analyze the structure further is to filter the dataset again. For this we split each subset according to its best 20% of yval values. We can also review the samples with ICE curves, which can show the heterogeneous relationship between the parameter survival_fraction and the target yval created by interactions.
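ICE curves are the per-observation counterpart of the PD curve: one curve per row, so heterogeneous effects that the averaged PD curve hides become visible. A minimal sketch of the idea (Python for illustration; the sign-flipping interaction model is a made-up stand-in, not the fitted surrogate):

```python
import numpy as np

def ice_curves(predict, X, col, grid):
    """One curve per observation: its prediction as `col` sweeps the
    grid while all of its other features stay fixed."""
    curves = np.empty((X.shape[0], len(grid)))
    for i, v in enumerate(grid):
        Xv = X.copy()
        Xv[:, col] = v
        curves[:, i] = predict(Xv)
    return curves  # the PD curve is curves.mean(axis=0)

# toy interaction: the effect of column 0 flips sign with column 1
predict = lambda Z: Z[:, 0] * np.sign(Z[:, 1] - 0.5)
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2))
grid = np.linspace(0.0, 1.0, 11)
curves = ice_curves(predict, X, 0, grid)
# some curves rise, others fall, while their average stays near zero
```

When rising and falling ICE curves cancel out like this, a flat PD curve can mask a parameter that strongly interacts with others.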

superBohbBest <- superBohb[superBohb$yval >= quantile(superBohb$yval, 0.8),]
superBohbBestTask <- TaskRegr$new(id = "superBohbBestTask", backend = superBohbBest, target = "yval")

superRandomBest <- superRandom[superRandom$yval >= quantile(superRandom$yval, 0.8),]
superRandomBestTask <- TaskRegr$new(id = "superRandomBestTask", backend = superRandomBest, target = "yval")
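The split above keeps every row whose yval reaches the 0.8 quantile of its subset. The same filtering logic, sketched in Python with made-up data:

```python
import numpy as np

rng = np.random.default_rng(42)
yval = rng.normal(loc=-0.3, scale=0.05, size=1000)  # stand-in for the yval column
threshold = np.quantile(yval, 0.8)

# keep only the rows whose yval lies in the best (highest) 20%
best = yval[yval >= threshold]
# with distinct values, exactly the top 200 of 1000 rows survive
```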

bohb Best

plotPartialDependence(superBohbBestTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)

random Best

plotPartialDependence(superRandomBestTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)

In this case, higher values do not seem to be worse. This is surprising, since in the general case low values performed better. It could mean that with good configurations of the other parameters, the survival_fraction parameter even gives better results when a high value is chosen. This could also explain the peak in the range between 0.5 and 0.75 for the “bohb” sample. Looking at the rug, we see that most configurations lie below 0.5 and the fewest above 0.75. Because of the few configurations with high values, the effect of good performances in this range is weaker. In the range between 0.5 and 0.75 there are more configurations, which therefore have a greater impact on the average curve. Although not all high values have poor performance, lower values seem to be the right choice, since most good configurations have lower values.

surrogate_learner

A very important parameter for the “bohb” subset was the surrogate_learner. We can already assume that “knn1” is the best surrogate_learner, since many of the other learners were filtered out in the top 20% dataset. But let us check this with a PDP.

Subset bohb

plotPartialDependence(superBohbTask, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)

Subset random

plotPartialDependence(superRandomTask, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)

In both subsets, “knn1” is indeed the best choice based on the PDP. There does not seem to be much difference between the other learners. For a more detailed analysis, we should split the data by surrogate learner and see whether the importance of the other parameters differs. Although it would be interesting to analyze the learners for both samples separately, we focus on the whole dataset to keep things simple and because the importances of the subsets do not differ much.

superKnn1 <- superSmashy[superSmashy$surrogate_learner == "knn1",] 
superKnn7 <- superSmashy[superSmashy$surrogate_learner == "knn7",] 
superBohblrn <- superSmashy[superSmashy$surrogate_learner == "bohblrn",]
superRanger <- superSmashy[superSmashy$surrogate_learner == "ranger",]

superKnn1Task <- TaskRegr$new(id = "knn1Task", backend = superKnn1, target = "yval")
superKnn7Task <- TaskRegr$new(id = "knn7task", backend = superKnn7, target = "yval")
superBohblrnTask <- TaskRegr$new(id = "bohblrnTask", backend = superBohblrn, target = "yval")
superRangerTask <- TaskRegr$new(id = "rangerTask", backend = superRanger, target = "yval")

Subset: knn1

plotImportance(superKnn1Task)

Subset: knn7

plotImportance(superKnn7Task)

Subset: bohblrn

plotImportance(superBohblrnTask)

Subset: ranger

plotImportance(superRangerTask)

The parameters sample and random_interleave_fraction are the most important for “knn1”, “knn7” and “ranger”. For “bohblrn”, the parameter survival_fraction is more important than random_interleave_fraction. The parameter filter_with_max_budget has barely any effect for all learners except “knn1”. These are the parameters we should check more closely.

The most important parameter for nearly all surrogate learners is the sample parameter.

knn1: sample

plotPartialDependence(superKnn1Task, "sample", rug = FALSE, )

knn7: sample

plotPartialDependence(superKnn7Task, "sample", rug = FALSE)

bohblrn: sample

plotPartialDependence(superBohblrnTask, "sample", rug = FALSE)

ranger: sample

plotPartialDependence(superRangerTask, "sample", rug = FALSE)

We already knew that “random” is better on average, but now we also know that this holds for all surrogate learners.

knn1: random_interleave_fraction

plotPartialDependence(superKnn1Task, "random_interleave_fraction", plotICE = FALSE)

knn7: random_interleave_fraction

plotPartialDependence(superKnn7Task, "random_interleave_fraction", plotICE = FALSE)

bohblrn: random_interleave_fraction

plotPartialDependence(superBohblrnTask, "random_interleave_fraction", plotICE = FALSE)

ranger: random_interleave_fraction

plotPartialDependence(superRangerTask, "random_interleave_fraction", plotICE = FALSE)

For the parameter random_interleave_fraction, higher values always seem to be better. For “knn1” and “knn7”, low values seem to have a stronger negative impact on the prediction than for “ranger” or “bohblrn”. For the surrogate learners “knn1” and “bohblrn”, the maximum results in slightly worse predicted performance, but since there are few instances there, this is not certain. Values between 0.75 and 0.95 can be considered optimal for this parameter.

Another important parameter for all surrogate learners is survival_fraction. Also, for “bohblrn” this parameter was noticeably more important than for the other learners. That is why we look at it next.

knn1: survival_fraction

plotPartialDependence(superKnn1Task, "survival_fraction", plotICE = FALSE)

knn7: survival_fraction

plotPartialDependence(superKnn7Task, "survival_fraction", plotICE = FALSE)

bohblrn: survival_fraction

plotPartialDependence(superBohblrnTask, "survival_fraction", plotICE = FALSE)

ranger: survival_fraction

plotPartialDependence(superRangerTask, "survival_fraction", plotICE = FALSE)

Low values of survival_fraction are better in general for the learners “knn1” and “knn7”. For “knn1” a value close to 0 and for “knn7” a value between 0.05 and 0.15 should be considered. For “bohblrn” values between 0.25 and 0.35 and for “ranger” values between 0.15 and 0.25 seem to produce the best predicted performances.

The last parameter we want to check is filter_with_max_budget. It was only important for “knn1” and not for the other learners.

knn1: filter_with_max_budget

plotPartialDependence(superKnn1Task, "filter_with_max_budget", plotICE = FALSE)

knn7: filter_with_max_budget

plotPartialDependence(superKnn7Task, "filter_with_max_budget", plotICE = FALSE)

bohblrn: filter_with_max_budget

plotPartialDependence(superBohblrnTask, "filter_with_max_budget", plotICE = FALSE)

ranger: filter_with_max_budget

plotPartialDependence(superRangerTask, "filter_with_max_budget", plotICE = FALSE)

When we compared the importance per surrogate_learner, we found that the filter_with_max_budget parameter was only important for “knn1”. Here we can see that for “knn1” the parameter filter_with_max_budget should be set to “TRUE”. For the other learners it is indeed not important whether the parameter is set to “TRUE” or “FALSE”.

Top 20%

When we compared the summary of the full dataset with the top 20% configurations, we could see that both “random” and “bohb” samples remained. We could also see that mostly “knn1” configurations were left. To see whether it is still possible to obtain good results with the other learners, let us look at the maximum yval per learner.

summary(superBest$surrogate_learner)
## bohblrn    knn1    knn7  ranger 
##       2     546      19       2
aggregate(x = superBest$yval,                
          by = list(superBest$surrogate_learner),              
          FUN = max) 
##   Group.1          x
## 1 bohblrn -0.2117795
## 2    knn1 -0.2170470
## 3    knn7 -0.2105208
## 4  ranger -0.2124898

It is interesting to see that the best configuration of each of the learners that were filtered out in large numbers achieves a better yval than the best “knn1” configuration. This is important because it shows that it is indeed possible to achieve good results with all learners, not only with “knn1”. But “knn1” achieves the best results on average, which means this learner is more robust: changes in configuration do not have as large a negative impact on performance as for the other learners.

We also want to investigate the best cases and therefore directly check the subdivided datasets.

surrogate_learner knn1

Let us investigate “knn1” a bit more. Because we have less data, we can also make use of a Parallel Coordinate Plot.

superKnn1Best <- superBohbBest[superBohbBest$surrogate_learner == "knn1",]

superKnn1BestTask <- TaskRegr$new(id = "superKnn1BestTask", backend = superKnn1Best, target = "yval")

PCP knn1

plotParallelCoordinate(superKnn1BestTask, labelangle = 10)

Importance Plot knn1

plotImportance(superKnn1BestTask)

In the PCP it can be seen that the parameter filter_with_max_budget should be set to “TRUE”, random_interleave_random to “FALSE”, and random_interleave_fraction should be high for good results.

According to the Importance Plot, the parameters filter_factor_first and filter_factor_last are very important as well and should be examined further.

knn1: PDP filter_factor_first

plotPartialDependence(superKnn1BestTask, "filter_factor_first", plotICE = FALSE)

knn1: PDP filter_factor_last

plotPartialDependence(superKnn1BestTask, "filter_factor_last", plotICE = FALSE)

In the PDP we can see that filter_factor_first should be high and that filter_factor_last has the best outcome for values between 1.5 and 2.5 or above 6.

budget_log_step

Another very important parameter for the “random” subset and for the filtered dataset is budget_log_step. First, let us investigate the parameter with a PDP on the full dataset.

Subset bohb

plotPartialDependence(superBohbTask, features = c("budget_log_step"), rug = FALSE, plotICE = FALSE)

Subset random

plotPartialDependence(superRandomTask, features = c("budget_log_step"), rug = FALSE, plotICE = FALSE)

For the “random” subset, higher values produce better outcomes. For the “bohb” subset there are two peaks, around -0.5 and 0.5. To find reasons for the two peaks, let us focus on the top 20% again.

Top 20%

bohb Best

plotPartialDependence(superBohbBestTask, features = c("budget_log_step"), rug = TRUE, plotICE = FALSE)

random Best

plotPartialDependence(superRandomBestTask, features = c("budget_log_step"), rug = TRUE, plotICE = FALSE)

Similar to the survival_fraction parameter, configurations with a low value seem to have a positive rather than negative effect on performance if the other parameters are set correctly. This could be the reason why there are two peaks for the “bohb” sample.

If we look at low values only, we can see that the predicted performance varies a lot and that other parameter configurations are responsible. We choose budget_log_step values below -1.4 to get fewer than 150 configurations.

budgetSubset <- superRandom[superRandom$budget_log_step < -1.4,]

budgetSubsetTask <- TaskRegr$new(id = "budgetSubsetTask", backend = budgetSubset, target = "yval")

plotParallelCoordinate(budgetSubsetTask, labelangle = 10)

In the PCP we can see that good values are often obtained with the “knn1” learner. A low survival_fraction is also important. The random_interleave_fraction parameter, in contrast, should be high.

Another possibility is to look at two-dimensional Partial Dependence Plots. We compare pairs of budget_log_step and the parameters we found in the PCP.
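The two-dimensional PDP applies the same clamping idea to a grid over two features at once. A minimal sketch (Python for illustration; the toy interaction model, in which a high second feature rescues a low first feature, is an assumption):

```python
import numpy as np

def partial_dependence_2d(predict, X, cols, grid_a, grid_b):
    """Average prediction with the two columns in `cols` clamped to
    every combination of grid values -- the surface a 2-D PDP draws."""
    surface = np.empty((len(grid_a), len(grid_b)))
    for i, a in enumerate(grid_a):
        for j, b in enumerate(grid_b):
            Xv = X.copy()
            Xv[:, cols[0]] = a
            Xv[:, cols[1]] = b
            surface[i, j] = predict(Xv).mean()
    return surface

# toy interaction: a high value of feature 1 rescues a low feature 0
predict = lambda Z: Z[:, 0] + Z[:, 1] + 2 * (1 - Z[:, 0]) * Z[:, 1]
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
grid = np.linspace(0.0, 1.0, 5)
surface = partial_dependence_2d(predict, X, (0, 1), grid, grid)
# the corner (feature 0 low, feature 1 high) is the best cell
```

Such a surface makes it visible when a seemingly bad value of one parameter is compensated by the right setting of another, which is exactly the pattern discussed for budget_log_step below -1.4.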

survival_fraction

plotPartialDependence(superRandomTask, features = c("budget_log_step", "survival_fraction"), rug = FALSE, gridsize = 10)

random_interleave_fraction

plotPartialDependence(superRandomTask, features = c("budget_log_step", "random_interleave_fraction"), rug = FALSE, gridsize = 10)

surrogate_learner and random_interleave_fraction

plotPartialDependence(superRandomTask, features = c("surrogate_learner", "random_interleave_fraction"), rug = FALSE, gridsize = 10)

We can see that high values suffer less from poor performance when the other parameters are also poorly configured. Conversely, it is also possible to achieve good values when budget_log_step is low and the other parameters are well configured. We can also say that the factor “knn1” of the parameter surrogate_learner achieves the best performance on average.

random_interleave_fraction

random_interleave_fraction can vary between 0 and 1. This parameter had a high importance in both subsets and was also the most important parameter for the 20% best configurations. Therefore it is well worth checking.

bohb Subset

plotPartialDependence(superBohbTask, features = c("random_interleave_fraction"), rug = FALSE, plotICE = FALSE)

random Subset

plotPartialDependence(superRandomTask, features = c("random_interleave_fraction"), rug = FALSE, plotICE = FALSE)

A good choice for random_interleave_fraction in the “bohb” samples is a high value; a good range seems to be between 0.75 and 0.95. For the “random” samples a value between 0.5 and 0.75 seems to produce the best performances.

Top 20%

bohb Best

plotPartialDependence(superBohbBestTask, features = c("random_interleave_fraction"), rug = FALSE, gridsize = 20, plotICE = FALSE)

random Best

plotPartialDependence(superRandomBestTask, features = c("random_interleave_fraction"), rug = FALSE, gridsize = 20, plotICE = FALSE)

The filtered dataset shows that low values do not have such a bad impact on the outcome, but high values are better. A value above 0.5 should be chosen.

filter_factor_last

The parameter filter_factor_last was only moderately important, but a brief check is worthwhile as well.

Bohb: full dataset

plotPartialDependence(superBohbTask, "filter_factor_last", plotICE = FALSE, gridsize = 40)

bohb: subdivided dataset

plotPartialDependence(superBohbBestTask, features = c("filter_factor_last"), rug = TRUE, plotICE = FALSE, gridsize = 40)

random: full dataset

plotPartialDependence(superRandomTask, "filter_factor_last", plotICE = FALSE, gridsize = 40)

random: subdivided dataset

plotPartialDependence(superRandomBestTask, features = c("filter_factor_last"), rug = TRUE, plotICE = FALSE, gridsize = 40)

filter_factor_last fluctuates a lot, which is why we choose a higher gridsize. The fluctuations raise the importance even though the range of predicted performances is not very large. The value for filter_factor_last should be between 1.5 and 2.5, or above 5.5 for “bohb” samples and between 5 and 5.5 for “random” samples.

filter_with_max_budget

Bohb: full dataset

plotPartialDependence(superBohbTask, "filter_with_max_budget", rug = FALSE, plotICE = FALSE)

bohb: subdivided dataset

plotPartialDependence(superBohbBestTask, features = c("filter_with_max_budget"), rug = FALSE, plotICE = FALSE)

random: full dataset

plotPartialDependence(superRandomTask, "filter_with_max_budget", rug = FALSE, plotICE = FALSE)

random: subdivided dataset

plotPartialDependence(superRandomBestTask, features = c("filter_with_max_budget"), rug = FALSE, plotICE = FALSE)

The parameter filter_with_max_budget has a weak effect but should be set to “TRUE”.

filter_factor_first

This parameter had barely an effect on the general case but got a little more important in the top 20% configurations. We check the partial dependence and the dependencies with the most important parameters to get more insight.

Bohb: full dataset

plotPartialDependence(superBohbTask, features = c("filter_factor_first"), rug = FALSE, plotICE = FALSE)

Bohb: subdivided dataset

plotPartialDependence(superBohbBestTask, features = c("filter_factor_first"), rug = TRUE, plotICE = FALSE)

random: full dataset

plotPartialDependence(superRandomTask, features = c("filter_factor_first"), rug = FALSE, plotICE = FALSE)

random: subdivided dataset

plotPartialDependence(superRandomBestTask, features = c("filter_factor_first"), rug = TRUE, plotICE = FALSE)

The parameter filter_factor_first shows interesting differences between the general and the subdivided case. While in the general case values above 6 decrease performance a lot, in the subset these values show the best performances. Since in the subset the majority of good cases lie in this range, it seems to be a good choice to pick a value above 6.

Comparison of the two datasets

Let us compare the results of the parameters from the two datasets.

sample: The sample parameter is very important for both datasets. For the lcbench dataset it should be “bohb” in any case and for the super dataset you can get good performances with “bohb” as well as with “random”.

survival_fraction: The parameter survival_fraction should be chosen according to the selected surrogate_learner in the lcbench dataset. This distinction was made because good values could be achieved with all learners. In particular, for the “knn1” learner, which was also the best choice for the super dataset, all values should be considered: values below 0.5 perform better on average over the whole dataset, but for the best configurations the value hardly matters and even higher values seem to be better. For the super dataset, a low value between 0 and 0.3 seems to be a good choice in general.

surrogate_learner: In the lcbench dataset, the surrogate_learner parameter was not particularly important, but influenced other parameters depending on the factor selected. Basically, “knn1” and “knn7” achieved the best performance values on average, but when considering only the best configurations, the “bohblrn” surrogate_learner achieved the best performance on average. For the super dataset, the parameter was very important, and most of the good results were achieved with “knn1”. This factor should basically be the choice. However, it should also be noted that good values could be achieved with all learners.

A very important parameter for both datasets was random_interleave_fraction. For the lcbench dataset the configuration depended on the surrogate_learner again while for the super dataset higher values led to better results.

A very important parameter for the “bohb” samples in the lcbench dataset is budget_log_step. This parameter should again be set according to the surrogate_learner, but for “knn1” a value between -0.5 and 0.5 should be all right. It needs to be mentioned that this parameter had repeated dips for “knn1” and “knn7”, so it is hard to choose the right value. For the super dataset higher values are better, but in the top 20% of configurations lower values achieve better yval values. Because of this ambiguity we chose not to limit this parameter.

For the lcbench dataset, the filter_factor_first parameter is the most important parameter for the best 20%. In general, values below 4 provide the best performance; an exception is the “bohblrn” surrogate_learner, for which no restriction should take place. For the super dataset, the parameter was most important for the best configurations in combination with the “knn1” surrogate_learner. For this dataset, values above 4 seem to be a good choice.

The filter_factor_last parameter was not really important for the lcbench dataset. The effect is small and it should generally not be used to subdivide the dataset. For the super dataset, filter_factor_last was very important for the top configurations, but this was due to high fluctuations. It was difficult to restrict the parameter, but values between 4 and 5 can be assumed.

The filter_with_max_budget parameter is easy to set: it should always be “TRUE” for both datasets.

The parameters filter_algorithm, filter_select_per_tournament and random_interleave_random also have barely any effect and therefore do not need to be limited.